Out of Time: Automated Lip Sync in the Wild

Authors

  • Joon Son Chung
  • Andrew Zisserman
Abstract

The goal of this work is to determine the audio-video synchronisation between mouth motion and speech in a video. We propose a two-stream ConvNet architecture that enables the mapping between the sound and the mouth images to be trained end-to-end from unlabelled data. The trained network is used to determine the lip-sync error in a video. We apply the network to two further tasks: active speaker detection and lip reading. On both tasks we set a new state-of-the-art on standard benchmark datasets.
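The abstract does not say how the lip-sync error is read off the trained network. As a minimal sketch, assume each stream yields one embedding per video frame in a shared space, and that the offset is taken to be the shift that minimises the mean Euclidean distance between the two embedding sequences. The function names, the `max_shift` parameter, and the distance-minimisation rule are illustrative assumptions, not details stated in the abstract.

```python
import math


def mean_distance(video_emb, audio_emb, offset):
    """Mean Euclidean distance between video frame t and audio frame t + offset.

    Both inputs are lists of equal-length float vectors, one per frame
    (an assumed representation of the two streams' outputs).
    """
    dists = []
    for t, v in enumerate(video_emb):
        a_idx = t + offset
        if 0 <= a_idx < len(audio_emb):
            dists.append(math.dist(v, audio_emb[a_idx]))
    # No overlapping frames at this shift: treat it as infinitely bad.
    return sum(dists) / len(dists) if dists else math.inf


def estimate_offset(video_emb, audio_emb, max_shift=5):
    """Return the shift (in frames) in [-max_shift, max_shift] with the
    smallest mean distance. A positive value means the audio lags the video."""
    candidates = range(-max_shift, max_shift + 1)
    return min(candidates, key=lambda k: mean_distance(video_emb, audio_emb, k))
```

Under these assumptions, a well-synchronised clip should return 0, and the magnitude of a non-zero result gives the sync error in frames.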

Similar resources

Automated Lip-Sync for 3D-Character Animation

A central task for animating computer generated characters is the synchronization of lip movements and speech signal. For real time synchronization high technical effort, which involves a face tracking system or data gloves, is needed to drive the expressions of the character. If the speech signal is already given, off-line synchronization is possible but the animator is left with a time consum...

Animating Lip-Sync Characters

Speech animation is traditionally considered as important but tedious work for most applications, especially when taking lip synchronization (lip-sync) into consideration, because the muscles on the face are complex and interact dynamically. Although there are several methods proposed to ease the burden on artists to create facial and speech animation, almost none are fast and efficient. In thi...

Automated lip-sync: Background and techniques

The problem of creating mouth animation synchronized to recorded speech is discussed. Review of a model of speech sound generation indicates that the automatic derivation of mouth movement from a speech soundtrack is a tractable problem. Several automatic lip-sync techniques are compared, and one method is described in detail. In this method a common speech synthesis method, linear prediction, ...

ObamaNet: Photo-realistic lip-sync from text

We present ObamaNet, the first architecture that takes any text as input and generates both the corresponding speech and synchronized photo-realistic lip-sync videos. Contrary to other published lip-sync approaches, ours is only composed of fully trainable neural modules and does not rely on any traditional computer graphics methods. More precisely, we use three main modules: a text-to-speech n...

Carnival: Combining Speech Technology and Computer Animation

Speech is powerful information technology and the basis of human interaction. By emitting streams of buzzing, popping, and hissing noises from our mouths, we transmit thoughts, intentions, and knowledge of the world from one mind to another. We’re accustomed to thinking of speech as an acoustic, auditory phenomenon. However, speech is also visible. Although the primary function of speech is to ...

Publication date: 2016